[ES|QL] Lookup join and Inline stats support for query approximation#145980
[ES|QL] Lookup join and Inline stats support for query approximation#145980jan-elastic wants to merge 10 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
|
Hi @jan-elastic, I've created a changelog YAML for you. |
6726448 to
3bfcab1
Compare
|
Hi @jan-elastic, I've created a changelog YAML for you. |
9f4f8eb to
f743ed6
Compare
luigidellaquila
left a comment
There was a problem hiding this comment.
Thanks @jan-elastic, for what I can see it looks good. The test coverage is good and the logic looks consistent.
For me it's a 👍 , but I'd also ask @astefan and @julian-elastic for a feedback
470cecb to
d051b06
Compare
|
Hi @jan-elastic, I've created a changelog YAML for you. |
d9b1230 to
214ca31
Compare
|
Your change can change the plan, right? I don't see a single test where you actually verify that the plan is correct after your change. Please look at how we add golden tests and add a few, so we can make sure the plan is correct. |
| verify("ROW i=[1,2,3] | EVAL x=TO_STRING(i) | DISSECT x \"%{x}\" | STATS i=10*POW(PERCENTILE(i, 0.5), 2) | LIMIT 10"); | ||
| verify("FROM test | URI_PARTS parts = last_name | STATS scheme_count = COUNT() BY parts.scheme | LIMIT 10"); | ||
| verify("FROM test | REGISTERED_DOMAIN rd = last_name | STATS c = COUNT() BY rd.registered_domain | LIMIT 10"); | ||
| verify("FROM test | INLINE STATS COUNT() BY last_name | LIMIT 10"); |
There was a problem hiding this comment.
Can you please add some tests with views? I know it is not related to this PR, but you dont have any. Views might introduce fork, which is n-nary plan, and it might have issue with finding the first leaf.
There was a problem hiding this comment.
Thanks for pointing that out.
I'm currently working on supporting FORK queries. I will have a look at views then too. Is that okay?
|
|
||
| PhysicalPlan child = plan.child().transformUp(LeafExec.class, leaf -> new SampleExec(Source.EMPTY, leaf, plan.sampleProbability())); | ||
| // The only non-unary plans that are currently supported are Joins. | ||
| // At the moment, the left side of the join is the "expensive" side and |
There was a problem hiding this comment.
How do you guard against someone adding a new binary node in the future and your feature not working correctly? Should we be checking for supported n-nary nodes instead of blindly allowing all n-nary nodes?
There was a problem hiding this comment.
New nodes are not supported for approximation by default, see Approximation.SUPPORTED_COMMANDS.
When someone adds a new binary node to this allowlist, I expect them to make sure that node works end-to-end. Anyway, I'm working on FORK now, which probably requires touching this (join -> only sample first child, fork -> sample all children). So this code won't be there for long.
julian-elastic
left a comment
There was a problem hiding this comment.
Overall looks great! I left some comments, feel free to check in after you address them.
I've added a test to Regarding the golden tests: query approximation doesn't have any yet. I'll give a look at them and add them in a separate PR. |
214ca31 to
575db2c
Compare
| Row.class, | ||
| Sample.class, | ||
| SampledAggregate.class, | ||
| StubRelation.class, |
There was a problem hiding this comment.
Might be worth adding a comment mentioning this class specifically. This one is a placeholder node and it should pretty much disappear from the logical plan tree after the optimizations.
There was a problem hiding this comment.
This StubRelation survives the optimizations and reaches EsqlSession::executeSubPlans. There subplan execution substitutes them.
I will add a comment though: "Temporary node generated by INLINE STATS"
| InlineJoin.class, | ||
| Insist.class, | ||
| LocalRelation.class, | ||
| Join.class, |
There was a problem hiding this comment.
A bit surprised to see Join here and not LookupJoin.
There was a problem hiding this comment.
Me too.
It's caused by this surrogate:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java#L67-L71
Apparently it is "to deal with serialization & co".
Maybe it's better to remove this surrogate, add serialization to LookupJoin, and make Join abstract? 🤷 I can't oversee the consequences. WDYT?
There was a problem hiding this comment.
I think PR solves it: jan-elastic#3
WDYT?
There was a problem hiding this comment.
Looks like Join is kind of synonymous with LookupJoin: the surrogate replaces the LookupJoin by Join, and LookupJoin itself is not even in PlanWritables.
| assert sampleProbability < 1.0; | ||
|
|
||
| PhysicalPlan child = plan.child().transformUp(LeafExec.class, leaf -> new SampleExec(Source.EMPTY, leaf, plan.sampleProbability())); | ||
| // The only non-unary plans that are currently supported are Joins. |
There was a problem hiding this comment.
Also, maybe in the same idea with Julian's comment. These changes here assume joins (in general), while we only support lookup joins now, let's not assume that "join" = "lookup join" for now. I commented similarly above in the list of supported_commands, let's make sure what we support now is as restrictive and realistic as possible.
If we add support for other joins in the future and users start using it and then we realize the support for those future joins needs further design/consideration/testing, adding restrictions means breaking changes.
There was a problem hiding this comment.
I agree it should be as restrictive as possible.
I think the root problem is the issue above (LookupJoin's surrogate making it a general Join).
There was a problem hiding this comment.
There was a problem hiding this comment.
here's a quick attempt at fixing it (missing some bwc stuff):
#146213 (it's this PR + 3 more commits)
There was a problem hiding this comment.
Without deep analysis, making bigger changes to Join.java deserves its own PR. I wouldn't do it here.
I think all you need is "is this join a lookup join?", right? Yeah, we don't make this easy and natural to check. But! Practically, you can indeed check this by walking down the right hand side of the join, picking up the EsRelation, and checking that it has IndexMode.LOOKUP.
That's something we can easily add as a method on Join.java, and can be added to this PR. (We do this check all the time; it should've become a helper method a long time ago, we sillies.)
There was a problem hiding this comment.
Sorry, I didn't realize we're inside the local physical optimizer, d'oh.
In terms of physical plans, we luckily have LookupJoinExec and don't have a general JoinExec. (There's HashJoinExec for INLINE STATS). So I'm not sure what exactly the problem is right here. We could throw or assert on a list of allowed physical plans (actually that'd be a bit safer, as physical optimization could shove some weird stuff before the STATS); but the question seems to be around the list of allowed logical plans, right? Yeah, for those, my comment around identifying lookup joins stands.
There was a problem hiding this comment.
OK, I've added an assertion now. It's all good. Thanks
dd6607b to
1705aac
Compare
astefan
left a comment
There was a problem hiding this comment.
There are some interesting edge cases that must be explored imho.
The one that I did test came from this PR where pruning inline stats in some scenarios leads to code paths that are rarely explored.
In PruneColumns.pruneColumnsInAggregate method I think there is a "silent" inlineJoin=true in case that aggregates is a SampledAggregate and I don't think that's the intention.
Running
SET approximation={"rows":10000};
FROM employees
| INLINE STATS x = MAX(salary) WHERE false, c = COUNT(*) BY emp_no
| KEEP x, c
| SORT x, c
| LIMIT 3
stops the node for me with this exception
[ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][search_coordination][T#1]], exiting java.lang.ExceptionInInitializerError
at org.elasticsearch.xpack.esql.optimizer.rules.logical.SubstituteApproximationPlan.apply(SubstituteApproximationPlan.java:27)
at org.elasticsearch.xpack.esql.optimizer.rules.logical.SubstituteApproximationPlan.apply(SubstituteApproximationPlan.java:20)
at org.elasticsearch.xpack.esql.rule.ParameterizedRuleExecutor.lambda$transform$0(ParameterizedRuleExecutor.java:29)
at org.elasticsearch.xpack.esql.rule.RuleExecutor.execute(RuleExecutor.java:128)
at org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizer.optimize(LogicalPlanOptimizer.java:137)
at org.elasticsearch.xpack.esql.session.EsqlSession.optimizedPlan(EsqlSession.java:1598)
I did also try a query from this other PR - #135011 - and got the same stacktrace as above. Maybe I missed something in my tests...
|
I can't reproduce your failure When running that query, I get the expected (tried both a locally running cluster, and CsvIT) |
1705aac to
4883422
Compare
🔍 Preview links for changed docs⏳ Building and deploying preview... View progress This comment will be updated with preview links when the build is complete. |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
No description provided.